- North America > United States > Pennsylvania (0.04)
- Asia > China > Zhejiang Province (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
\texttt{R$^\textbf{2}$AI}: Towards Resistant and Resilient AI in an Evolving World
Sun, Youbang, Wang, Xiang, Fu, Jie, Lu, Chaochao, Zhou, Bowen
In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into ``Make AI Safe'', which applies post-hoc alignment and guardrails but remains brittle and reactive, and ``Make Safe AI'', which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose \textit{safe-by-coevolution} as a new formulation of the ``Make Safe AI'' paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce \texttt{R$^2$AI} -- \textit{Resistant and Resilient AI} -- as a practical framework that unites resistance against known threats with resilience to unforeseen risks. \texttt{R$^2$AI} integrates \textit{fast and slow safe models}, adversarial simulation and verification through a \textit{safety wind tunnel}, and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintain continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward AGI and ASI.
- Information Technology (0.93)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (3 more...)
Exclusive: New Research Shows AI Strategically Lying
For years, computer scientists have worried that advanced artificial intelligence might be difficult to control. A smart enough AI might pretend to comply with the constraints placed upon it by its human creators, only to reveal its dangerous capabilities at a later point. Until this month, these worries have been purely theoretical. Some academics have even dismissed them as science fiction. But a new paper, shared exclusively with TIME ahead of its publication on Wednesday, offers some of the first evidence that today's AIs are capable of this type of deceit. The paper, which describes experiments jointly carried out by the AI company Anthropic and the nonprofit Redwood Research, shows a version of Anthropic's model, Claude, strategically misleading its creators during the training process in order to avoid being modified.
Randomization Techniques to Mitigate the Risk of Copyright Infringement
Chen, Wei-Ning, Kairouz, Peter, Oh, Sewoong, Xu, Zheng
In this paper, we investigate potential randomization approaches that can complement current practices of input-based methods (such as licensing data and prompt filtering) and output-based methods (such as recitation checkers, license checkers, and model-based similarity scores) for copyright protection. This is motivated by the inherent ambiguity of the rules that determine substantial similarity in copyright precedents. Given that there is no agreed-upon quantifiable measure of substantial similarity, complementary approaches can potentially further decrease liability. Similar randomized approaches, such as differential privacy, have been successful in mitigating privacy risks. This document focuses on the technical and research perspective on mitigating copyright violation and hence is not confidential. After investigating potential solutions and running numerical experiments, we concluded that using the notion of Near Access-Freeness (NAF) to measure the degree of substantial similarity is challenging, and the standard approach of training a Differentially Private (DP) model incurs a significant cost when used to ensure NAF. Alternative approaches, such as retrieval models, might provide a more controllable scheme for mitigating substantial similarity.
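The abstract notes that training a DP model is the standard randomization approach it evaluates. The core aggregation step of DP training (clip each per-example gradient, sum, add calibrated Gaussian noise) can be sketched as follows; this is a minimal illustration of the general DP-SGD mechanism, not the authors' implementation, and the function name and parameters are hypothetical:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD aggregation step: clip each per-example gradient to
    clip_norm, sum the clipped gradients, then add Gaussian noise whose
    scale is proportional to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds the clip bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # Noise standard deviation is noise_multiplier * clip_norm (the
    # per-example sensitivity of the clipped sum).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)
```

Because each example's contribution is bounded by `clip_norm`, the added noise masks the influence of any single training example, which is the property DP leverages; the abstract's point is that the noise level needed to also guarantee NAF-style output similarity bounds is much more costly.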
TSIS: A Supplementary Algorithm to t-SMILES for Fragment-based Molecular Representation
Wu, Juan-Ni, Wang, Tong, Tang, Li-Juan, Wu, Hai-Long, Yu, Ru-Qin
String-based molecular representations, such as SMILES, are a de facto standard for linearly representing molecular information. However, the requirement for paired symbols and the parsing algorithm result in long-range grammatical dependencies, making it difficult for even state-of-the-art deep learning models to accurately comprehend the syntax and semantics. Although DeepSMILES and SELFIES have addressed certain limitations, they still struggle with advanced grammar, which makes some strings difficult to read. This study introduces a supplementary algorithm, TSIS (TSID Simplified), to the t-SMILES family. Comparative experiments between TSIS and another fragment-based linear solution, SAFE, indicate that SAFE presents challenges in managing long-term dependencies in grammar. TSIS continues to use the tree defined in t-SMILES as its foundational data structure and encoding logic, which sets it apart from the SAFE model. The performance of TSIS models surpasses that of SAFE models, indicating that the algorithms of the t-SMILES family offer certain advantages.
Can Copyright be Reduced to Privacy?
Elkin-Koren, Niva, Hacohen, Uri, Livni, Roi, Moran, Shay
Recent advancements in Machine Learning have sparked a wave of new possibilities and applications that could potentially transform various aspects of our daily lives and revolutionize numerous professions through automation. However, training such algorithms relies heavily on extensive content, either annotated or generated by individuals who may be impacted by these algorithms. Consequently, the identification and determination of when and how content can be used within this framework without infringing upon individuals' legal rights have become a pressing challenge.
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Iowa (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Media (1.00)
- Law > Intellectual Property & Technology Law (1.00)
- Government (1.00)
- (2 more...)